A Data Complexity Approach to Kernel Selection for Support Vector Machines
Authors
Abstract
We describe a data complexity approach to kernel selection based on the behavior of polynomial and Gaussian kernels. Our results show that a Gaussian kernel produces a Gram matrix carrying useful local information that has no counterpart in polynomial kernels. By exploiting the neighborhood information captured by data complexity measures, we are able to carry out a form of meta-generalization. Our goal is to predict which data sets favor a particular kernel (Gaussian or polynomial). The end result is a framework to improve the model selection process in Support Vector Machines.

Copyright © 2014, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

Kernel methods have gained popularity in the machine learning community in recent years. One key strength of these methods is the ability to map the original training points into a higher dimensional feature space, thereby facilitating the job of a low capacity (linear) learning machine. Mapping points into high dimensional spaces has proved an effective strategy for classification in learning algorithms like Support Vector Machines (SVM), mainly because the mapping does not imply an increase in the complexity of the classifier, and because it is possible to rely on the kernel trick, which removes the need to define the mapping functions explicitly.

The recent success of kernel classification methods has prompted the design of several techniques that aim to improve performance, but only a few studies have focused on understanding the role that the kernel function plays in them. Information about that role is stored in the Gram matrix, or kernel matrix. A kernel matrix has size n × n, where n is the number of elements in the data set; each entry is the dot product of a pair of training elements mapped into a high dimensional space. Past work proposes extracting properties from this matrix to perform kernel selection (Chen et al. 2006; You, Hamsici, and Martinez 2011), using three main metrics: Fisher's Discriminant, Bregman's Divergence, and Homoscedasticity.

Our study focuses on two of the most widely used kernel functions: the polynomial and the Gaussian kernels. We analyze their behavior to extract information that can be instrumental during the kernel selection process within SVM. Our strategy is to attend to the relationship between kernel behavior and data complexity measures (Ho and Basu 2002), and to capture that relationship in a simple decision tree model.
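To make the n × n kernel matrix described above concrete, the following minimal sketch builds Gram matrices for a polynomial and a Gaussian kernel on three toy points. The function names, the toy points, and the parameter values (`degree`, `coef0`, `sigma`) are illustrative assumptions and are not taken from the paper.

```python
import math

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (<x, y> + coef0) ** degree (parameters are assumed values)."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram_matrix(points, kernel):
    """Return the n x n matrix whose (i, j) entry is kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]

# Toy data: two close neighbors and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (3.0, 4.0)]

K_poly = gram_matrix(points, polynomial_kernel)
K_gauss = gram_matrix(points, gaussian_kernel)

for name, K in (("polynomial", K_poly), ("gaussian", K_gauss)):
    print(name)
    for row in K:
        print("  ", ["%.4f" % v for v in row])
```

In this sketch the Gaussian entries depend only on the distance between two points, so the entry for the two neighbors is close to 1 and the entries involving the distant point are close to 0. The polynomial entries depend on inner products instead: with these assumed parameters, every entry in the row for the origin has the same value regardless of how far the other point is.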
Similar resources
Separating Well Log Data to Train Support Vector Machines for Lithology Prediction in a Heterogeneous Carbonate Reservoir
The prediction of lithology is necessary in all areas of petroleum engineering: to design a project in any branch of the field, the lithology must be well known. Support vector machines (SVMs) use an analytical approach to classification based on statistical learning theory, the principles of structural risk minimization, and empirical risk minimization. In this res...
Development of a Pharmacogenomics Model based on Support Vector Regression with Optimal Features Selection Approach to Determine the Initial Therapeutic Dose of Warfarin Anticoagulant Drug
Introduction: Using artificial intelligence tools in pharmacogenomics is one of the latest bioinformatics research fields. One of the most important drugs whose initial therapeutic dose is difficult to determine is the anticoagulant warfarin. Warfarin is an oral anticoagulant that, due to its narrow therapeutic window and complex interrelationships of individual factors, the selection of its ...
Remote Sensing and Land Use Extraction for Kernel Functions Analysis by Support Vector Machines with ASTER Multispectral Imagery
Land use is considered a key element in land change studies, environmental planning, and natural resource applications. Studying the Earth's surface by remote sensing has many benefits, such as continuous acquisition of data, broad regional coverage, cost-effective data, accurate map data, and large archives of historical data. To study land use / cover, remote sensing as an effic...
A comparative study of performance of K-nearest neighbors and support vector machines for classification of groundwater
The aim of this work is to examine the feasibility of the support vector machine (SVM) and K-nearest neighbor (K-NN) classifiers for the classification of an aquifer in the Khuzestan Province, Iran. For this purpose, 17 groundwater quality variables including EC, TDS, turbidity, pH, total hardness, Ca, Mg, total alkalinity, sulfate, nitrate, nitrite, fluoride, phosphate, Fe, Mn, Cu, ...
Application of Artificial Neural Networks and Support Vector Machines for carbonate pores size estimation from 3D seismic data
This paper proposes a method for the prediction of pore size values in hydrocarbon reservoirs using 3D seismic data. To this end, an actual carbonate oil field in the south-western part of Iran was selected. Taking real geological conditions into account, different models of the reservoir were constructed for a range of viable pore size values. Seismic surveying was performed next on these models. F...
Publication date: 2014